April 12, 2020

An Approach to Computing Predicted Purchase - Part 3/3

Background:

If you work with transactional data, you have probably faced the challenge of computing meaningful KPIs that quantify how promising a given customer is or will be (based on past data, of course). One computation that comes in handy when evaluating customers is the predicted purchase probability, i.e., a value that lets you assess whether it might be worth investing in this customer from a marketing perspective.

Given the data, the predicted purchase is a probabilistic quantification of the number of future purchases for a given customer.

In [157]:
from IPython.display import Image
PATH = "/home/julien/data_blog/post_material/predictive_purchase/"
Image(filename = PATH + "purchase_history.png", width=500, height=500)
Out[157]:

Now, there are many ways one could compute the probability of purchase. In this post, I will use one method that I find pretty elegant, discussed in a paper published by Fader and colleagues in 2005.

Important note
While I used this model in a business project, and it produced useful results that went into production, I am not an expert on financial computations, so take this as it is: an honest attempt to compute a useful metric!

This post is structured in two separate sections:
1) generating transactional data randomly, and making a few observations about what influences the fit of your model
2) using real data from transactions of CDs (feeling some nostalgia there)

Import libraries

We first start by importing the required libraries, and setting a few parameters for the notebook that I like to use. If you do not have these libraries installed, simply install them in your environment using pip install, or check the documentation online.

In [162]:
import os

import numpy as np

import pymc3 as pm

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import plotnine as p9

import pandas as pd
pd.set_option("display.max_rows", 120)
pd.set_option("display.max_columns", 120)

display_settings = {
    'max_columns': 10,
    'expand_frame_repr': True,  # Wrap to multiple pages
    'max_rows': 50,
    'precision': 2,
    'show_dimensions': True
}

for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)

from IPython.display import clear_output
import timeit

# Set figure size for easier reading
plt.rcParams['figure.figsize'] = (12,6)

Import a typical transactional dataset

In [96]:
from lifetimes.datasets import load_dataset

df = load_dataset(
    'CDNOW_sample.txt',
    header=None,
    names=['customer_id', 'customer_index','transaction_date', 'quantity', 'amount'], 
    delim_whitespace = True,
    #converters=['date': lambda d: pd.to_datetime(d, format='%Y%m%d')]
).drop(columns={'customer_index'})

df.head(2)
Out[96]:
customer_id transaction_date quantity amount
0 4 19970101 2 29.33
1 4 19970118 2 29.73

2 rows × 4 columns

This is a transactional dataset from a business allowing continuous purchase of items (CDs). We have access to three data features: (i) date of transaction, (ii) quantity purchased and (iii) customer ID. That is all you need to get started.
In order to train the model, you need to boil the data down to a customer-level dataframe with three parameters:
1) frequency: number of repeat purchases for that customer.
2) recency: time between first and last purchase, in days, weeks or any meaningful scale for your business.
3) T: age of the customer (birth = first purchase).

This would be a so-called RFM table (Recency-Frequency-Monetary; note that I will be discussing the monetary component in a separate post).
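Before reaching for a library, it helps to see that this reduction is essentially one groupby. Below is a minimal pandas sketch on a toy frame; the column names `customer_id` and `transaction_date` and the cut-off date are illustrative, and the lifetimes utility used next handles details this sketch ignores:

```python
import pandas as pd

# Toy transaction log: one row per purchase
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "transaction_date": pd.to_datetime(
        ["2019-01-01", "2019-01-15", "2019-02-01", "2019-01-10"]
    ),
})

# End of the observation window (the "now" of the analysis)
observation_end = pd.Timestamp("2019-03-01")

g = tx.groupby("customer_id")["transaction_date"]
rfm_sketch = pd.DataFrame({
    # frequency: number of repeat purchases (the first one does not count)
    "frequency": g.count() - 1,
    # recency: time between first and last purchase, here in days
    "recency": (g.max() - g.min()).dt.days,
    # T: customer age, from first purchase to the end of observation
    "T": (observation_end - g.min()).dt.days,
})
print(rfm_sketch)
```

Customer 2 purchased only once, so both frequency and recency come out as 0, matching the zero-heavy distributions we will see below.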

In [164]:
from lifetimes.datasets import load_transaction_data

from lifetimes.utils import summary_data_from_transaction_data

transaction_data = load_transaction_data()
rfm = summary_data_from_transaction_data(transaction_data, customer_id_col='id', datetime_col='date')
rfm.sample(3)
rfm.sample(3)
Out[164]:
frequency recency T
id
1153 0.0 0.0 330.0
3448 0.0 0.0 178.0
4353 4.0 101.0 169.0

3 rows × 3 columns

We can see that customer #4353 made four repeat purchases, that their first purchase happened 169 days before the end of the observation period, and that 101 days elapsed between their first and last purchase.
The drawing below depicts the meaning of each parameter in the RFM table.

In [165]:
PATH = "/home/julien/data_blog/post_material/predictive_purchase/"
Image(filename = PATH + "frequency_recency.png", width=500, height=500)
Out[165]:

We can have a quick look at these features and check whether the distributions make sense.

In [166]:
pd.plotting.scatter_matrix(rfm, hist_kwds={'bins':50});

A few observations:
(i) As expected from the data features, we can see that the majority of customers purchased very few times (frequency).
(ii) As a result, the recency feature has a large peak at 0, which contains all customers who purchased only once from that business.
(iii) The business had some stability in acquiring new customers, as suggested by the uniformity of the age (T) distribution.

This example shows what typical transactional data looks like.

Now, a very simple and easy-to-implement approach would be to calculate a so-called RFM score at the customer level. Each customer's value on each feature is binned according to that feature's quartiles, which provides a score for each customer (in our case only an RF score, since we do not have access to monetary value data).

In [167]:
quantiles = rfm.quantile(q=[0.25,0.5,0.75])
quantiles = quantiles.to_dict()

segmented_rfm = rfm.copy()
In [169]:
def RScore(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
def FMScore(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1
In [177]:
# Categorize data points according to quartile
segmented_rfm = segmented_rfm.assign(r_quartile = segmented_rfm['recency']
                                     .apply(RScore, args=('recency',quantiles)))
segmented_rfm = segmented_rfm.assign(f_quartile = segmented_rfm['frequency']
                                     .apply(FMScore, args=('frequency',quantiles)))
# Create RS score per customer
segmented_rfm['RFMScore'] = (segmented_rfm.r_quartile.map(str) 
                           + segmented_rfm.f_quartile.map(str) 
                            )
segmented_rfm.head()
Out[177]:
frequency recency T r_quartile f_quartile RFMScore
id
0 0.0 0.0 298.0 1 4 14
1 0.0 0.0 224.0 1 4 14
2 6.0 142.0 292.0 4 1 41
3 0.0 0.0 147.0 1 4 14
4 2.0 9.0 183.0 3 2 32

5 rows × 6 columns

While there is some virtue to this method (for instance interpretability, simplicity and speed of implementation), one might wonder whether it is wise to assume that past behavior maps onto future behavior that straightforwardly.

Instead, we could use machine learning to predict purchases based on past purchase patterns for each customer.

Pareto/NBD vs BetaGeo/NBD

From a historical perspective, the Pareto/NBD model has been an attractive model for computing predicted purchase across different business models. However, its complexity made it hard for businesses to implement for regular updates of predicted purchase. As an alternative, Fader & colleagues proposed the BetaGeo/NBD model, which produces an arguably comparable fit to the Pareto/NBD but is simpler to implement.

We will first focus on the BG/NBD model and test its implementation on a public dataset. In the follow-up post, I will use the Pareto/NBD model on the same dataset, and explore it from another perspective using hierarchical Bayesian modeling to predict purchases.

In [179]:
from lifetimes import BetaGeoFitter
bgf = BetaGeoFitter(penalizer_coef=0.0)

Assumptions of the BG/NBD model (Fader et al., 2005)

(i) While active, the number of transactions made by a customer follows a Poisson process with transaction rate lambda.
This is equivalent to assuming that the time between transactions is exponentially distributed with transaction rate $\lambda$

\begin{equation*} f(t_j \mid t_{j-1}; \lambda) = \lambda e^{-\lambda (t_j - t_{j-1})}, \quad t_j > t_{j-1} > 0 \end{equation*}

(ii) Heterogeneity in lambda follows a gamma distribution

\begin{equation*} f(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\lambda \alpha}}{\Gamma(r)}, \quad \lambda > 0 \end{equation*}

(iii) After any transaction, a customer becomes inactive with probability p. Therefore the point at which the customer “drops out” is distributed across transactions according to a (shifted) geometric distribution

\begin{equation*} P(\text{inactive immediately after the } j\text{th transaction}) = p(1-p)^{j-1}, \quad j = 1, 2, 3, \ldots \end{equation*}

(iv) Heterogeneity in p follows a beta distribution

\begin{equation*} f(p \mid a, b) = \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}, \quad 0 \leq p \leq 1 \end{equation*}

(v) The transaction rate lambda and the dropout probability p vary independently across customers
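These assumptions define a generative story, which we can simulate directly. The sketch below draws lambda and p per customer and walks through the transaction process; the parameter values (r, alpha, a, b, and a 52-week window) are arbitrary illustrative choices, not fitted estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bgnbd_customer(r=0.25, alpha=4.0, a=0.8, b=2.5, T=52.0):
    """Simulate one customer's repeat transactions under assumptions (i)-(v)."""
    lam = rng.gamma(shape=r, scale=1.0 / alpha)  # (ii) lambda ~ Gamma(r, alpha)
    p = rng.beta(a, b)                           # (iv) p ~ Beta(a, b)
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam)          # (i) exponential inter-purchase times
        if t > T:                                # observation window ends
            break
        times.append(t)
        if rng.random() < p:                     # (iii) geometric dropout after a purchase
            break
    return times

n_purchases = [len(simulate_bgnbd_customer()) for _ in range(5000)]
print("mean repeat purchases over 52 weeks:", np.mean(n_purchases))
```

Simulating a cohort like this is a useful way to check your intuition for how each hyperparameter shapes the frequency distribution before fitting the model on real data.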

Let's check whether the model has acceptable predictive performance. To do so, we will train the model on a training set and ask it to predict data from a test set (a.k.a. validation / holdout period).

You could do this yourself using scikit-learn or a manual function, but lifetimes provides a utility function to split the original transactional data (which is often the format available at companies).
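For intuition, a manual calibration/holdout split is just a cut on the transaction date. Here is a minimal sketch on toy data; the column names `id` and `date` are chosen to mirror the lifetimes sample data, and the cut-off date is arbitrary:

```python
import pandas as pd

# Toy transaction log with the column names used by the lifetimes sample data
transactions = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "date": pd.to_datetime(
        ["2014-01-05", "2014-10-01", "2014-03-10", "2014-08-20", "2014-11-02"]
    ),
})

calibration_period_end = pd.Timestamp("2014-09-01")

# Calibration set: everything up to the cut-off date; holdout: everything after
calibration = transactions[transactions["date"] <= calibration_period_end]
holdout = transactions[transactions["date"] > calibration_period_end]

# Holdout purchase count per customer (0 for customers silent in the holdout)
freq_holdout = (holdout.groupby("id").size()
                .reindex(transactions["id"].unique(), fill_value=0))
print(freq_holdout.to_dict())
```

The lifetimes utility additionally recomputes frequency, recency and T within the calibration window, which is what the model is actually fitted on.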

In [182]:
from lifetimes.utils import calibration_and_holdout_data
transaction_data = load_transaction_data()

calibration_period_end='2014-09-01'
observation_period_end='2014-12-31'

summary_cal_holdout = calibration_and_holdout_data(transaction_data, 'id', 'date',
                                                   calibration_period_end=calibration_period_end,
                                                   observation_period_end=observation_period_end)
summary_cal_holdout.head(2)
Out[182]:
frequency_cal recency_cal T_cal frequency_holdout duration_holdout
id
0 0.0 0.0 177.0 0.0 121
1 0.0 0.0 103.0 0.0 121

2 rows × 5 columns

In [183]:
# Fit model to the data
In [184]:
bgf.fit(summary_cal_holdout['frequency_cal'], 
        summary_cal_holdout['recency_cal'], 
        summary_cal_holdout['T_cal']
       )
Out[184]:
<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.96, alpha: 2.26, b: 3.48, r: 0.19>

From the output above, we see that the function 1) splits the data into two sets and 2) produces a so-called RF table.

We run some standard diagnostics, plotting the average number of transactions in the test set vs the training set, for observed ("frequency_holdout") and predicted data ("model predictions").

In [185]:
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)

# Age in training set
age_train = np.mean(summary_cal_holdout.T_cal)
# Age in testing set is the length of the holdout window, in weeks
age_test = time_window = (pd.to_datetime(observation_period_end)
                        - pd.to_datetime(calibration_period_end)).days / 7
# X axis limit from observed output
xlim=[0,7]
x_axis_no_model = np.linspace(xlim[0],xlim[1], 100)
no_model_line = (age_test/age_train) * x_axis_no_model

plt.plot(x_axis_no_model, no_model_line, linestyle='--', color='black', label='no model')
Out[185]:
[<matplotlib.lines.Line2D at 0x7fdd45c5bf90>]

We can see that the model is quite good at picking up the purchase dynamics.
Most customers make a low number of purchases, which is probably why we get a better fit in the lower bins compared to the higher ones.
Importantly, if we estimated future purchases without a model (or rather, using a simple heuristic), we would overestimate purchases, which would hurt the business down the line.

Now that we have validated the model using a validation set, let's train the model on the whole dataset.

In [186]:
from lifetimes import BetaGeoFitter
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(rfm['frequency'], rfm['recency'], rfm['T'])
print(bgf)
<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.85, alpha: 1.86, b: 3.18, r: 0.16>
In [187]:
# Model summary
bgf.summary
Out[187]:
coef se(coef) lower 95% bound upper 95% bound
r 0.16 3.99e-03 0.16 0.17
alpha 1.86 9.54e-02 1.68 2.05
a 1.85 1.30e-01 1.60 2.10
b 3.18 2.94e-01 2.61 3.76

4 rows × 4 columns

lifetimes provides some nice matrix plots that let you visualize the probability of customers being alive given the parameters, along with some other features of your dataset. I won't use them here, but I recommend exploring these plots; they can add value when explaining the process to stakeholders.

Now that our model has been validated and fitted on the whole dataset, we can go on and predict purchases for the customers in the dataset, for a given time window t:

In [190]:
# Predicted purchases are calculated for a given time window t
t = 12
rfm['pred_purch_coming_week=' + str(t)] = (bgf.conditional_expected_number_of_purchases_up_to_time
                                               (
                                                t, 
                                                rfm['frequency'], 
                                                rfm['recency'], 
                                                rfm['T']
                                               )
                                              )
In [195]:
# Plot the predictive purchase histogram
rfm['pred_purch_coming_week=' + str(t)].plot(kind='hist',bins=50)
plt.xlabel('proba of number of purchases in coming weeks=' +str(t))
plt.title('histo, customer count per proba of purchase');

This is another approach, proposed at PyData Seattle 2017 by Van Dyke & Gauthier, who use a Bayesian approach "to approximate the full posterior of the model parameters, and gain a nice understanding of the uncertainty around our estimates, not just the average case".

In [139]:
# import theano 
from pymc3.math import exp, log

class ParetoNBD(pm.Continuous):
    """
    Custom distribution class for Pareto/NBD likelihood.
    """
    
    def __init__(self, lambda_, mu, *args, **kwargs):
        super(ParetoNBD, self).__init__(*args, **kwargs)
        self.lambda_ = lambda_
        self.mu = mu
        
    def logp(self, x, t_x, T):
        """
        Log-likelihood function for an individual customer's purchasing rate \lambda
        and lifetime \mu given their frequency, recency and time since first purchase.
        """
        
        log_lambda = log(self.lambda_)
        log_mu = log(self.mu)
        mu_plus_lambda = self.lambda_ + self.mu
        log_mu_plus_lambda = log(mu_plus_lambda)
        
        p_1 = x * log_lambda + log_mu - log_mu_plus_lambda - t_x * mu_plus_lambda
        p_2 = (x + 1) * log_lambda - log_mu_plus_lambda - T * mu_plus_lambda
        
        return log(exp(p_1) + exp(p_2))
In [140]:
# Extract data for model following notation from Fader/Hardie
N = rfm.shape[0] # number of customers
x = rfm['frequency'].values
t_x = rfm['recency'].values
T = rfm['T'].values # length of training period

n_draws = 2000

pnbd_model = pm.Model()

with pnbd_model:
    
    # Uninformative priors on model hyperparameters see Polson and Scott 
    # https://projecteuclid.org/download/pdfview_1/euclid.ba/1354024466
    r = pm.HalfCauchy('r', beta=2)
    alpha = pm.HalfCauchy('alpha', beta=2)
    s = pm.HalfCauchy('s', beta=2)
    beta = pm.HalfCauchy('beta', beta=2)
    
    # Gamma prior on purchasing rate parameter lambda
    lambda_ = pm.Gamma('lambda', alpha=r, beta=alpha, shape=N, testval=np.random.rand(N))
    # Gamma prior on lifetime parameter mu
    mu = pm.Gamma('mu', alpha=s, beta=beta, shape=N, testval=np.random.rand(N))

    # Custom distribution for Pareto-NBD likelihood function
    loglikelihood = ParetoNBD("loglikelihood", mu=mu, lambda_=lambda_, observed={'x': x, 't_x': t_x, 'T': T})
    
    # Sample the model
    trace = pm.sample(n_draws, init=None)
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, lambda, beta, s, alpha, r]
Sampling 4 chains, 0 divergences: 100%|██████████| 10000/10000 [10:19<00:00, 16.15draws/s]
The acceptance probability does not match the target. It is 0.6911664884177412, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.7179295758790892, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 10% for some parameters.
In [141]:
# Traceplots to check for convergence
_ = pm.traceplot(trace, varnames=['r','alpha','s','beta'])
/home/julien/anaconda3/envs/data_blog/lib/python3.7/site-packages/pymc3/plots/__init__.py:21: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))

Key Quantities

Here we provide functions for deriving two key quantities, the probability a customer is still alive at a future period and the expected purchases of a customer in a future period.

Probability customer is still alive at time T

In [150]:
def prob_alive_at_T(lambda_, mu, t_x, T):
    """
    Probability a customer's time of death \tau occurs after T.
    
    Pr(\tau > T \> | \> \lambda, \mu, x, t_x,T) = \dfrac{1}{1+\dfrac{\mu}{\mu+\lambda}
    \big\{e^{(\lambda+\mu)(T-t_x)}-1\big\}}
    
    See expression (7) in technical appendix of Abe.
    
    :param lambda_: lambda parameter at the customer level
    :type lambda_: scalar
    :param mu: mu parameter at the customer level
    :type mu: scalar
    :param t_x: recency of transactions
    :type t_x: float
    :param T: duration of the calibration/training period
    :type T: float
    
    :return: probability of being alive at time T.
    """

    den = 1 + (mu / (lambda_ + mu)) * (np.exp((lambda_ + mu) * (T - t_x)) - 1)
    return 1 / den
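As a quick sanity check of this expression (the function is restated so the snippet runs standalone; the lambda/mu values are arbitrary): a customer whose last purchase coincides with the end of the observation window should get probability 1, while one silent for a long stretch should get a probability near 0.

```python
import numpy as np

def prob_alive_at_T(lambda_, mu, t_x, T):
    """P(customer still alive at T), given purchase rate lambda_ and death rate mu."""
    den = 1 + (mu / (lambda_ + mu)) * (np.exp((lambda_ + mu) * (T - t_x)) - 1)
    return 1 / den

# Last purchase at the very end of the window -> certainly alive
recent = prob_alive_at_T(lambda_=0.5, mu=0.05, t_x=52.0, T=52.0)
# Last purchase 40 weeks before the end of the window -> almost certainly dead
stale = prob_alive_at_T(lambda_=0.5, mu=0.05, t_x=12.0, T=52.0)
print(recent, stale)
```

Note how the silence gap (T - t_x) enters through the exponential: the longer a customer with a high purchase rate stays silent, the faster the survival probability collapses.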

Cumulative expected purchases for customer in period (T, T+t)

In [151]:
def likelihood(lambda_, mu, x, t, T):
    """Pareto/NBD likelihood function.

    :param lambda_: lambda parameter at the customer-level.
    :param mu: mu parameter at the customer level
    :param x: number of repeat transactions
    :param t: recency
    :param T: length of the calibration/training period.

    :return: likelihood value.
    """

    p1 = x * np.log(lambda_) + np.log(mu) - np.log(mu + lambda_) - t * (mu + lambda_)
    p2 = (x + 1) * np.log(lambda_) - np.log(mu + lambda_) - T * (mu + lambda_)
    return np.exp(p1) + np.exp(p2)

def predict(t, lambda_, mu, x, tx, T):
    """Conditional expected purchases at end of time (T, T+t).

    Used to assess holdout period performance and to make predictions
    for future time period t. Conditional on purchase history (x, t_x, T).

    E[X(T,T+t) \> | \> \lambda, \mu, x, t_x, T] = \dfrac{1}{L(\lambda, \mu)|(x, t_x, T)}
    \times \dfrac{\lambda^{x+1}}{\mu}e^{-(\lambda+\mu)T}(1-e^{-\mu t})

    http://brucehardie.com/notes/034/corr_Pareto-NBD.pdf

    :param t: time period
    :type t: scalar

    :param lambda_: lambda parameter at the customer level
    :type lambda_: scalar

    :param mu: mu parameter at the customer level
    :type mu: scalar

    :param x: number of repeat transactions
    :type x: integer

    :param tx: recency of transactions
    :type tx: float

    :param T: duration of the calibration/training period
    :type T: float

    :return: expected number of purchases (scalar)
    """
    like = likelihood(lambda_, mu, x, tx, T)
    p2 = lambda_ ** (x + 1) / mu * np.exp(-(lambda_ + mu) * T) * (1 - np.exp(-mu * t))

    return 1 / like * p2
In [152]:
_ = pm.plot_posterior(trace, varnames=['r','alpha','s', 'beta'])
/home/julien/anaconda3/envs/data_blog/lib/python3.7/site-packages/pymc3/plots/__init__.py:21: UserWarning: Keyword argument `varnames` renamed to `var_names`, and will be removed in pymc3 3.8
  warnings.warn('Keyword argument `{old}` renamed to `{new}`, and will be removed in pymc3 3.8'.format(old=old, new=new))
In [153]:
customer_index = 150
# show purchasing behavior
rfm.iloc[customer_index]
Out[153]:
frequency      6.0
recency        9.0
T            218.0
Name: 150, Length: 3, dtype: float64
In [154]:
lambda_post = trace['lambda']
mu_post = trace['mu']

We underforecast the number of purchases for this customer in the holdout period, but we correctly identified that the customer was still alive.

Holdout Period Predictions for Entire Customer Cohort

To get a picture of model performance for the entire customer cohort, we predict purchases for every customer over the holdout period and compare them with the observed values.

In [156]:
# holdout period is 39 weeks
t = 39
# total posterior samples per customer: the trace concatenates all chains,
# so this is chains * draws (here 4 * 2000 = 8000), not n_draws
n_samples = lambda_post.shape[0]
# predictions are size of customer base x number of posterior samples
holdout_predictions = np.empty([N, n_samples])

for i in np.arange(N):
    holdout_predictions[i] = predict(
        t, 
        lambda_post[:,i], 
        mu_post[:,i], 
        x[i], 
        t_x[i], 
        T[i]
    )

# predictions are posterior mean
rfm['frequency_predicted'] = holdout_predictions.mean(axis=1)

In the holdout period we predicted 1,724 purchases and observed 1,788.

In [355]:
rfm[['frequency_holdout','frequency_predicted']].sum()
Out[355]:
frequency_holdout      1788.000000
frequency_predicted    1724.684102
dtype: float64

The plot below shows the average purchases in the holdout period grouped by the number of repeat purchases made in the calibration period.

In [363]:
xlim=(1, 20)
mean_frequencies = rfm.groupby('frequency_cal')[['frequency_holdout',
                                     'frequency_predicted']].mean()
mean_frequencies.rename(columns={'frequency_holdout': 'Observed',
                                 'frequency_predicted': 'Model'},
                        inplace=True)
mean_frequencies.plot(kind='line',
                      title='Conditional Expected Purchases in Holdout Period', figsize=(16, 8))

# Generate a dummy model with holdout freq = t_holdout/t_calib
t_calib = np.mean(rfm['T_cal'])
t_holdout = t
x_heuristics = np.linspace(xlim[0],xlim[1],100)
heuristics = t_holdout/t_calib * x_heuristics

plt.plot(x_heuristics, heuristics, linestyle='--', color='black', label='Simple Heuristics')
plt.legend(loc=0)
plt.xlim(xlim)
plt.ylabel('Mean Number of Transactions in Holdout Period')
plt.xlabel('Number of Transactions in Calibration Period')
plt.grid(False)
plt.show()
In [364]:
from sklearn.metrics import mean_squared_error
print("RMSE: %s" % np.sqrt(mean_squared_error(rfm['frequency_holdout'], rfm['frequency_predicted'])))
RMSE: 1.41695547193
In [155]:
# Select distributions of lambda and mu for a customer
lambda_individual = lambda_post[:,customer_index]
mu_individual = mu_post[:,customer_index]

# predict purchases for the user at t = 39
t = 39
predict(t, lambda_individual, mu_individual, x[customer_index], t_x[customer_index], T[customer_index]).mean()
Out[155]:
9.402268808466233e-08
In [ ]:
 

April 02, 2020

An Approach to Computing Predicted Purchase - Part 2/3

Background:

This post is a follow-up to Computing Predicted Purchase I.
If you have not read it yet, I suggest you skim through the initial post before reading on.

In this post, I will be playing around with transactional data to see how it influences the fit of the BetaGeo model.

Import libraries

We first start by importing the required libraries, and setting a few parameters for the notebook that I like to use. If you do not have these libraries installed, simply install them in your environment using pip install, or check the documentation online.

In [1]:
import os

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import plotnine as p9

import pandas as pd
pd.set_option("display.max_rows", 120)
pd.set_option("display.max_columns", 120)

display_settings = {
    'max_columns': 10,
    'expand_frame_repr': True,  # Wrap to multiple pages
    'max_rows': 50,
    'precision': 2,
    'show_dimensions': True
}

for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)

from IPython.display import clear_output
import timeit

# Set figure size for easier reading
plt.rcParams['figure.figsize'] = (16,8)

As a reminder, here are the assumptions of the BetaGeo model. This will be important, since we will vary some distributions of the data features to see how that affects the model fit.

Assumptions of the BG/NBD model (quoting from Fader et al., 2005)

(i) While active, the number of transactions made by a customer follows a Poisson process with transaction rate lambda.
This is equivalent to assuming that the time between transactions is exponentially distributed with transaction rate $\lambda$

\begin{equation*} f(t_j \mid t_{j-1}; \lambda) = \lambda e^{-\lambda (t_j - t_{j-1})}, \quad t_j > t_{j-1} > 0 \end{equation*}

(ii) Heterogeneity in lambda follows a gamma distribution

\begin{equation*} f(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\lambda \alpha}}{\Gamma(r)}, \quad \lambda > 0 \end{equation*}

(iii) After any transaction, a customer becomes inactive with probability p. Therefore the point at which the customer “drops out” is distributed across transactions according to a (shifted) geometric distribution

\begin{equation*} P(\text{inactive immediately after the } j\text{th transaction}) = p(1-p)^{j-1}, \quad j = 1, 2, 3, \ldots \end{equation*}

(iv) Heterogeneity in p follows a beta distribution

\begin{equation*} f(p \mid a, b) = \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}, \quad 0 \leq p \leq 1 \end{equation*}

(v) The transaction rate lambda and the dropout probability p vary independently across customers

Before we run the model, we need to validate that it has acceptable predictive performance. To do so, we will train the model on a training set and ask it to predict data from a test set (a.k.a. validation / holdout period).

Now let's see if we can reproduce the results with generated data. As we saw, typical transactional data provides features such as transaction date, quantity purchased, generated turnover (sometimes with other monetary KPIs such as margin or order value), and of course a customer ID, so that transactions can be aggregated at the customer level.

Let's generate that data ourselves!

Let's say we gathered data from nCustomers customers, who performed transactions between 2018-01-01 and 2020-01-01.

In [434]:
# Total number of customers in our simulated dataset
nCustomers = 500000
# Start of the transaction period
start_observation_period = '2018-01-01'
# End of the transaction period
end_observation_period   = '2020-01-01'
# Seed NumPy random generator for reproduction purposes
np.random.seed(42)

A subset of these customers purchased only once, while others performed repeated purchases (let's say 50/50).

In [435]:
# Proportion rebuyers
proportion_rebuyers = 0.5

It would be relatively safe to assume that the number of purchases follows a gamma distribution, with a shape favouring small repeat-purchase counts.

In [436]:
gamma_shape = 4

I first generate random customer IDs

In [437]:
max_purchase_number = 40
In [438]:
import random
import string

def generate_customer_random_id(length=10, units = nCustomers, chars=string.ascii_uppercase + string.digits):
    return ["".join([random.choice(chars) for i in range(length)]) for j in range(units)]
In [439]:
df_transactions = pd.DataFrame()
df_transactions = (df_transactions
                    .assign(customer_id = generate_customer_random_id(length = 10, 
                                                                      units  = nCustomers
                                                                     )
                            )
)

I then define the number of purchases per customer according to the parameters defined earlier.

In [440]:
# Create logical mask for rebuyers and one time buyers
mask_rebuyers        = np.ones (int(nCustomers*   proportion_rebuyers))
mask_one_time_buyers = np.zeros(int(nCustomers*(1-proportion_rebuyers)))

# Create mask from distribution to create purchaser customers 
purchaser_distribution_gamma = np.random.gamma(shape = gamma_shape,
                                         size=int(nCustomers*proportion_rebuyers)
                                        ).round()

purchaser_distribution_unifo = np.random.uniform(low = 1, 
                                           high = max_purchase_number, 
                                           size = int(nCustomers*proportion_rebuyers)
                                          )
# Add random purchases for some of the rebuyers customers
mask_rebuyers = (mask_rebuyers + purchaser_distribution_unifo)

# Set one time buyers purchase number = 1
mask_one_time_buyers = mask_one_time_buyers + 1

# Concatenate masks
purchase_mask = np.concatenate([mask_rebuyers,
                                mask_one_time_buyers])

# Shuffle mask
np.random.shuffle(purchase_mask)

# Assign the number of purchases as column
df_transactions = df_transactions.assign(number_of_purchases = purchase_mask.astype(int))

df_transactions = df_transactions.loc[df_transactions.index.repeat(df_transactions.number_of_purchases)]
In [441]:
from sklearn.utils import shuffle
df_transactions = shuffle(df_transactions)

In a third step, I generate random dates

In [442]:
def generate_random_dates(start, end, n):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.DatetimeIndex((10**9*np.random.randint(start_u, end_u, n, dtype=np.int64)))
In [443]:
# Generate random dates
random_dates = generate_random_dates(start = pd.to_datetime(start_observation_period),
                                     end = pd.to_datetime(end_observation_period), 
                                     #n = np.floor((df_transactions.shape[0]/1000)).astype(int))
                                     n = (df_transactions.shape[0]))
                                      

df_transactions = df_transactions.assign(receipt_date = np.random.choice(random_dates, 
                                                                         size=df_transactions.shape[0],
                                                                         replace=True))
                                            

Finally, let's add some monetary data, such as the turnover generated per receipt.
Here, I assume that the data has been corrected for returns, but that should of course be checked beforehand.

In [444]:
df_transactions = df_transactions.assign(monetary_value = np.random.gamma(shape=10, size=df_transactions.shape[0])) 
                                         #np.random.rand(df_transactions.shape[0])*100)
In [445]:
df_transactions.sample(20)
Out[445]:
customer_id number_of_purchases receipt_date monetary_value
470538 JLYXYLAQ9J 24 2018-06-17 12:44:57 6.76
240870 U2V0Y3ZCFR 40 2019-09-27 04:15:01 6.77
217862 CQ9J9TPZEL 22 2018-05-22 16:15:00 10.61
73286 IOGTBGF999 14 2018-06-17 22:50:23 11.41
299364 RWIF1M14WD 27 2018-05-16 18:32:46 13.85
57267 842066K27J 31 2018-02-15 13:34:54 9.29
156618 HTL6JTDVWS 13 2019-11-15 09:18:50 15.07
171663 TIH033J7XY 38 2019-06-08 01:07:31 9.05
46784 XX6TVXTJCU 22 2018-04-07 19:35:01 8.75
25661 HWDQKFNVMJ 37 2018-10-13 11:04:33 10.16
128847 0Y96Q1FI8G 39 2019-06-18 08:51:21 14.61
6304 NS3I74OCXU 40 2018-06-24 14:35:59 9.95
5166 L8OZMCPS0V 39 2019-05-31 13:50:35 13.97
75466 YWSGSMZYSH 22 2019-12-29 11:47:48 6.39
188238 ZM9A7VXHBQ 35 2019-09-17 03:28:18 4.95
59653 A3PT83UTTV 28 2019-08-31 00:53:29 7.67
337507 5YJ8RL86SR 38 2019-04-28 13:17:51 11.07
269664 UM4K1HI7A7 25 2018-03-12 00:50:37 10.95
380608 U9LIDX8IKB 35 2019-10-05 08:51:50 7.72
180654 YTUCC36C1H 36 2019-01-21 15:40:22 4.45

20 rows × 4 columns

Good, we now have a dataset, generated pseudo-randomly (since we seeded the random generator), on which we can work to predict purchases at the customer level.

Preparing the data for an ML approach

We will need a few features that are contained in the dataset but not yet isolated. Let's extract them.

I typically code my munging steps as separate functions, and then pipe the data through each one of them.

In [446]:
# Munging pipeline
# Start munging pipeline by copying the dataframe
def start_pipeline(dataf):
    dataf = dataf.copy()
    return dataf

# Format columns
def format_cols(dataf):
    dataf.columns = [c.lower() for c in dataf.columns]
    dataf.columns = [c.replace(' ', '_') for c in dataf.columns]
    return dataf

def correct_timestamps(dataf):
    return (dataf
     .assign(receipt_date = pd.to_datetime(dataf['receipt_date'],format="%Y/%m/%d"))
            )

def first_last_receipt_dates(dataf):
    dataf["first_receipt_date"] = dataf.groupby('customer_id').receipt_date.transform('min')
    dataf["last_receipt_date"]  = dataf.groupby('customer_id').receipt_date.transform('max')
    return dataf

def compute_age_recency(dataf,timestamp_ref='2020-01-01T00'):
    return (dataf
    .assign(age_in_weeks     = lambda d: (pd.Timestamp(timestamp_ref)# pd.to_datetime("now") 
                                        - d['first_receipt_date']).dt.days / 7)
    .assign(recency_in_weeks = lambda d: (d['last_receipt_date'] 
                                        - d['first_receipt_date']).dt.days / 7))
In [447]:
df_ml = (df_transactions
.pipe(start_pipeline)
.pipe(format_cols)            
.pipe(correct_timestamps)
.pipe(first_last_receipt_dates)
.pipe(compute_age_recency, timestamp_ref = '2020-01-01')
)

df_ml.head(10)
Out[447]:
customer_id number_of_purchases receipt_date monetary_value first_receipt_date last_receipt_date age_in_weeks recency_in_weeks
219477 ZIDYF1MSHE 31 2018-04-18 17:30:40 10.57 2018-01-08 14:52:29 2019-12-23 08:03:28 103.14 101.86
84815 W3VQVR9I4D 35 2018-03-25 09:08:13 9.55 2018-01-06 02:06:10 2019-12-27 04:54:51 103.43 102.86
372130 DO1X1YLIY0 12 2019-07-21 10:34:56 10.01 2018-01-25 12:23:00 2019-12-30 00:24:17 100.71 100.43
390705 FLQQMRRBX6 26 2018-05-21 09:57:27 14.76 2018-02-03 19:39:31 2019-11-26 23:02:02 99.43 94.43
445969 6O16E3I3SQ 17 2019-12-04 14:20:04 6.70 2018-01-29 19:24:13 2019-12-23 17:48:35 100.14 98.86
59846 1IS7D4ABWG 31 2018-03-20 08:59:21 7.18 2018-02-06 17:13:42 2019-12-31 05:26:29 99.00 98.86
23901 95AOUVLHTN 1 2019-12-06 00:54:44 6.43 2019-12-06 00:54:44 2019-12-06 00:54:44 3.57 0.00
349319 KYQVG5V7L0 26 2019-10-18 12:51:43 16.58 2018-02-02 12:43:51 2019-12-30 02:05:51 99.57 99.29
471841 I6MOPSPPYS 36 2018-03-22 05:56:18 6.65 2018-01-10 07:59:26 2019-12-19 19:18:37 102.86 101.14
473977 UNCVK9EDDM 14 2019-02-15 14:29:55 9.39 2018-01-03 22:01:10 2019-12-22 16:08:21 103.86 102.43

10 rows × 8 columns

In [448]:
df_ml.loc[df_ml.customer_id == 'VQCUZ3RHCU']
Out[448]:
customer_id number_of_purchases receipt_date monetary_value first_receipt_date last_receipt_date age_in_weeks recency_in_weeks

0 rows × 8 columns

In [469]:
df_ml.drop_duplicates(subset=['customer_id']).age_in_weeks.plot(kind='hist', bins=100)
Out[469]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d020c950>

The Beta Geo model for predicting customer purchase

In a paper published in 2005, Fader, Hardie & Lee proposed the Beta-Geometric/NBD (BG/NBD) model, a modified version of the commonly used Pareto/NBD model, which had been quite challenging to implement from a computational perspective.
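As a reminder, here is my paraphrase of the model's core assumptions (a sketch, so treat the notation as illustrative rather than the paper's exact formulation):

```latex
% While "alive", a customer purchases following a Poisson process with rate \lambda:
P(X(t) = x \mid \lambda) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}
% Heterogeneity in \lambda across customers is Gamma distributed:
\lambda \sim \mathrm{Gamma}(r, \alpha)
% After each transaction, the customer drops out with probability p,
% and heterogeneity in p across customers is Beta distributed:
p \sim \mathrm{Beta}(a, b)
```

The four parameters (r, alpha, a, b) are exactly the ones reported by `bgf.summary` further down.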

In [450]:
# Import models
from lifetimes import BetaGeoFitter,ParetoNBDFitter
bgf = BetaGeoFitter(penalizer_coef=10.)

#bgf_Pareto = ParetoNBDFitter(penalizer_coef=0.1)
In [470]:
# Verify that the model is doing a relatively good job at predicting values
from lifetimes.utils import calibration_and_holdout_data
# This function creates a summary of each customer over a calibration and holdout period (training and testing)
# It accepts transaction data, and returns a DataFrame of sufficient statistics.
summary_cal_holdout = calibration_and_holdout_data(df_ml, 'customer_id', 'receipt_date',
                                                    calibration_period_end='2019-06-01',
                                                    observation_period_end='2020-01-01',
                                                    freq='W',
                                                   # monetary_value_col = 'monetary_value'
                                                  )
In [471]:
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

bgf.fit(summary_cal_holdout['frequency_cal'], 
        summary_cal_holdout['recency_cal'], 
        summary_cal_holdout['T_cal'])
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)

# Model summary
bgf.summary
      fun: -10.942142948033462
 hess_inv: array([[ 2.11215922e+00,  2.12575861e+00,  2.64779153e+03,
        -1.60508036e+03],
       [ 2.12575861e+00,  3.56394351e+00,  2.10560417e+03,
        -1.27629225e+03],
       [ 2.64779153e+03,  2.10560417e+03,  2.08870837e+10,
        -1.26631653e+10],
       [-1.60508036e+03, -1.27629225e+03, -1.26631653e+10,
         7.67726876e+09]])
      jac: array([-5.54077421e-07,  1.20133147e-07,  4.36074647e-12,  0.00000000e+00])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 115
      nit: 42
     njev: 104
   status: 2
  success: False
        x: array([ -0.66699956,  -2.75145715, -17.29566552,  10.31307884])
---------------------------------------------------------------------------
ConvergenceError                          Traceback (most recent call last)
<ipython-input-471-5e69895006f3> in <module>
      3 bgf.fit(summary_cal_holdout['frequency_cal'], 
      4         summary_cal_holdout['recency_cal'],
----> 5         summary_cal_holdout['T_cal'])
      6 plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)
      7 

~/anaconda3/envs/data_blog/lib/python3.7/site-packages/lifetimes/fitters/beta_geo_fitter.py in fit(self, frequency, recency, T, weights, initial_params, verbose, tol, index, **kwargs)
    127             verbose,
    128             tol,
--> 129             **kwargs
    130         )
    131 

~/anaconda3/envs/data_blog/lib/python3.7/site-packages/lifetimes/fitters/__init__.py in _fit(self, minimizing_function_args, initial_params, params_size, disp, tol, bounds, **kwargs)
    117                 """
    118             The model did not converge. Try adding a larger penalizer to see if that helps convergence.
--> 119             """
    120             )
    121         )

ConvergenceError: 
The model did not converge. Try adding a larger penalizer to see if that helps convergence.
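The error message suggests increasing the penalizer. Rather than retrying by hand, a small helper can automate the search; this is a generic sketch of my own (the `fit_fn` callback and the penalizer grid are illustrative choices, not part of the lifetimes API):

```python
def fit_with_growing_penalizer(fit_fn, penalizers=(0.0, 0.1, 1.0, 10.0)):
    """Retry a fit with increasingly strong penalizers until one converges.

    fit_fn(penalizer) should return a fitted model, or raise an exception
    (e.g. lifetimes' ConvergenceError) when the optimizer fails.
    """
    last_err = None
    for pen in penalizers:
        try:
            # Return the first penalizer that yields a converged model
            return pen, fit_fn(pen)
        except Exception as err:
            last_err = err
    raise last_err

# Usage with lifetimes would look roughly like:
# pen, bgf = fit_with_growing_penalizer(
#     lambda p: BetaGeoFitter(penalizer_coef=p).fit(
#         summary_cal_holdout['frequency_cal'],
#         summary_cal_holdout['recency_cal'],
#         summary_cal_holdout['T_cal']))
```

Keep in mind that a large penalizer shrinks the parameters toward zero, so convergence obtained this way should be sanity-checked against the holdout plots.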
In [453]:
# We can visualize one particular customer
from lifetimes.plotting import plot_history_alive

# Pick a random customer with a substantial purchase history
cust_id = df_ml.loc[df_ml.number_of_purchases > max_purchase_number/6].sample(1).iloc[0].customer_id

print(df_ml.loc[df_ml.customer_id == cust_id])
# Age of the customer in weeks (the model was fitted with freq='W')
weeks_since_birth = round((pd.Timestamp('2020-01-01T00')
                  - df_ml.loc[df_ml['customer_id'] == cust_id]['receipt_date'].min()).days / 7)
plot_history_alive(bgf, 
                   weeks_since_birth, 
                   df_ml.loc[df_ml['customer_id'] == cust_id], 
                   'receipt_date')
       customer_id  number_of_purchases        receipt_date  monetary_value  \
461246  H47HL2NVZ1                   38 2018-09-01 02:16:04           12.61   
461246  H47HL2NVZ1                   38 2018-03-29 23:55:34            9.69   
461246  H47HL2NVZ1                   38 2018-04-28 06:12:15           11.80   
461246  H47HL2NVZ1                   38 2018-11-07 04:30:06           10.29   
461246  H47HL2NVZ1                   38 2019-11-12 09:09:56            5.90   
461246  H47HL2NVZ1                   38 2018-11-18 12:09:40           11.57   
461246  H47HL2NVZ1                   38 2018-09-21 14:26:31           12.23   
461246  H47HL2NVZ1                   38 2018-08-30 08:13:59            6.72   
461246  H47HL2NVZ1                   38 2019-01-24 11:57:19           10.31   
461246  H47HL2NVZ1                   38 2018-08-29 19:44:20           16.25   
461246  H47HL2NVZ1                   38 2018-05-14 06:15:19            3.78   
461246  H47HL2NVZ1                   38 2019-07-10 07:28:05            7.22   
461246  H47HL2NVZ1                   38 2018-03-21 19:12:54            6.53   
461246  H47HL2NVZ1                   38 2019-12-03 06:11:47            8.49   
461246  H47HL2NVZ1                   38 2019-11-15 09:47:32            9.09   
461246  H47HL2NVZ1                   38 2018-02-28 17:02:10           13.94   
461246  H47HL2NVZ1                   38 2019-05-30 19:33:07           15.55   
461246  H47HL2NVZ1                   38 2019-05-31 11:06:34           12.89   
461246  H47HL2NVZ1                   38 2018-10-29 22:55:26           17.05   
461246  H47HL2NVZ1                   38 2018-07-30 03:50:28           11.87   
461246  H47HL2NVZ1                   38 2018-06-23 01:02:56            5.83   
461246  H47HL2NVZ1                   38 2019-06-19 03:01:12            7.68   
461246  H47HL2NVZ1                   38 2019-04-10 03:29:05            9.70   
461246  H47HL2NVZ1                   38 2018-10-02 05:54:47           13.47   
461246  H47HL2NVZ1                   38 2019-03-22 02:25:42            7.77   
461246  H47HL2NVZ1                   38 2018-04-07 19:44:13            7.32   
461246  H47HL2NVZ1                   38 2019-01-28 05:42:51            8.87   
461246  H47HL2NVZ1                   38 2019-12-05 01:31:43           11.68   
461246  H47HL2NVZ1                   38 2019-11-13 15:30:21            9.79   
461246  H47HL2NVZ1                   38 2018-06-10 08:59:21            8.74   
461246  H47HL2NVZ1                   38 2019-08-01 22:27:33           10.30   
461246  H47HL2NVZ1                   38 2018-01-05 20:42:01            9.40   
461246  H47HL2NVZ1                   38 2019-02-06 21:59:29           17.74   
461246  H47HL2NVZ1                   38 2019-09-29 19:29:11            8.74   
461246  H47HL2NVZ1                   38 2018-01-27 16:58:28            9.15   
461246  H47HL2NVZ1                   38 2018-06-15 01:47:16           10.73   
461246  H47HL2NVZ1                   38 2018-10-19 23:59:02            9.56   
461246  H47HL2NVZ1                   38 2019-12-20 07:12:59           12.43   

        first_receipt_date   last_receipt_date  age_in_weeks  recency_in_weeks  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  
461246 2018-01-05 20:42:01 2019-12-20 07:12:59        103.57            101.86  

[38 rows x 8 columns]
Out[453]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60a4b2de50>
In [454]:
# Transform transaction data into customer-based data with three attributes
from lifetimes.utils import summary_data_from_transaction_data

summary = summary_data_from_transaction_data(df_ml, 
                                             'customer_id', 
                                             'receipt_date', 
                                             observation_period_end='2020-01-01',
                                             datetime_format = '%Y-%m-%d',
                                             freq='W',
                                             monetary_value_col = 'monetary_value')
summary.sample(2)
Out[454]:
frequency recency T monetary_value
customer_id
0K0PXY11PB 0.0 0.0 74.0 0.00
L8W199LWS8 11.0 83.0 91.0 11.55

2 rows × 4 columns

In [455]:
bgf.fit(summary['frequency'], summary['recency'], summary['T'])
bgf.summary
Out[455]:
coef se(coef) lower 95% bound upper 95% bound
r 1.08e-01 1.90e-04 1.07e-01 1.08e-01
alpha 1.16e+00 5.37e-03 1.15e+00 1.17e+00
a 4.47e-18 1.95e-14 -3.83e-14 3.83e-14
b 1.87e-05 2.24e-04 -4.20e-04 4.57e-04

4 rows × 4 columns

In [456]:
from lifetimes.plotting import plot_frequency_recency_matrix
plot_frequency_recency_matrix(bgf)
Out[456]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d33cbad0>
In [457]:
from lifetimes.plotting import plot_probability_alive_matrix

plot_probability_alive_matrix(bgf)
Out[457]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60c18d6590>
In [458]:
from lifetimes.plotting import plot_period_transactions
plot_period_transactions(bgf)
Out[458]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d03f7e90>
In [360]:
# Add predicted purchases to summary dataframe
# Predicted purchases are calculated for a given time window t
t = 12
summary['pred_purch_coming_week=' + str(t)] = (bgf.conditional_expected_number_of_purchases_up_to_time
                                               (
                                                t, 
                                                summary['frequency'], 
                                                summary['recency'], 
                                                summary['T']
                                               )
                                              )
/home/julien/anaconda3/envs/data_blog/lib/python3.7/site-packages/pandas/core/series.py:679: RuntimeWarning: invalid value encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
In [361]:
print(summary.sort_values(by='pred_purch_coming_week=' + str(t)).head(5))
summary['pred_purch_coming_week=' + str(t)].plot(kind='hist', bins=100)
plt.xlabel('Predicted number of purchases in the coming ' + str(t) + ' weeks')
plt.title('Histogram: customer count per predicted purchase count');
             frequency  recency      T  monetary_value  \
customer_id                                              
5TX56GVQ7C         0.0      0.0  104.0             0.0   
DRLCZ8JOYT         0.0      0.0  104.0             0.0   
94WYYHNS2J         0.0      0.0  104.0             0.0   
7S5APVDOQK         0.0      0.0  104.0             0.0   
C60QZMO4KW         0.0      0.0  104.0             0.0   

             pred_purch_coming_week=12  
customer_id                             
5TX56GVQ7C                        0.01  
DRLCZ8JOYT                        0.01  
94WYYHNS2J                        0.01  
7S5APVDOQK                        0.01  
C60QZMO4KW                        0.01  

[5 rows x 5 columns]

I need to scale the data, because my attributes are on incomparable scales.

In [58]:
from sklearn.preprocessing import StandardScaler
to_cluster = summary[['pred_purch_coming_week=' + str(t),'monetary_value']]

# stscaler = StandardScaler().fit(to_cluster)
# to_cluster = stscaler.transform(to_cluster)
to_cluster = to_cluster.dropna()
to_cluster
Out[58]:
pred_purch_coming_week=12 monetary_value
club_card_number
1000044688 0.07 0.0
1000051773 0.09 0.0
1000076175 0.08 0.0
1000081821 0.07 0.0
1000088953 0.08 0.0
... ... ...
9500856806 0.17 0.0
9500867467 0.22 0.0
9500889732 0.20 0.0
9500972324 0.15 0.0
9501041606 0.24 0.0

492703 rows × 2 columns
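As a side note, what `StandardScaler` does further down is simply a column-wise z-score; a minimal NumPy sketch of the same transformation:

```python
import numpy as np

def standardize(X):
    # Column-wise z-score: subtract the mean, divide by the standard deviation
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
Z = standardize(X)  # each column now has mean 0 and unit variance
```

After this transformation, both attributes contribute on a comparable scale to the Euclidean distances the clustering algorithms rely on.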

In [76]:
from sklearn.mixture import GaussianMixture

from sklearn import cluster, mixture
from sklearn.cluster import KMeans

default_base = {'quantile': .3,
                'eps': .3,
                'damping': .9,
                'preference': -200,
                'n_neighbors': 10,
                'n_clusters': 5,
                'min_samples': 20,
                'xi': 0.05,
                'min_cluster_size': 0.1}
params = default_base.copy()

# ============
# Create cluster objects
# ============

#ms = cluster.MeanShift(bin_seeding=True)
#two_means = cluster.MiniBatchKMeans(n_clusters=params['n_clusters'])
#affinity_propagation = cluster.AffinityPropagation(
#    damping=params['damping'], preference=params['preference'])
#birch = cluster.Birch(n_clusters=params['n_clusters'])
#dbscan = cluster.DBSCAN(eps=params['eps'])
gmm = mixture.GaussianMixture(
    n_components=params['n_clusters'], covariance_type='full')
km = KMeans(n_clusters=params['n_clusters'], 
             max_iter=100, 
             init='k-means++', 
             n_init=1,
             random_state=42,
             verbose=0)

clustering_algorithms = (
    #('MiniBatchKMeans', two_means),
    #('AffinityPropagation', affinity_propagation),
    #('MeanShift', ms),
    #('Birch', birch),
    #('DBSCAN', dbscan),
    ('GaussianMixture', gmm),)

gmm.fit(to_cluster)
y_pred = gmm.predict(to_cluster)
label_colors = ["tomato", "grey", "limegreen", "gold", "slateblue", "moccasin"]
colors = [label_colors[i] for i in y_pred]
# to_cluster is still a DataFrame here, so use .iloc for positional indexing
plt.scatter(to_cluster.iloc[:, 0], to_cluster.iloc[:, 1], color=colors)
plt.xlabel('Predicted purchases in the coming ' + str(t) + ' weeks')
plt.ylabel('Monetary Value in euros')
plt.title('Predicted purchase by MonVal / ' + clustering_algorithms[0][0]);
plt.savefig('cluster_all.png')
In [91]:
unique, counts = np.unique(y_pred, return_counts=True)
label_colors = ["tomato", "grey", "limegreen", "gold", "slateblue"]
print(counts)
print()
unique, counts = np.unique(y_pred, return_counts=True)
print(pd.concat([pd.DataFrame(label_colors), pd.DataFrame(counts)], axis = 1))
[ 34413 239637  47325   5487 157760]

           0       0
0     tomato   34413
1       grey  239637
2  limegreen   47325
3       gold    5487
4  slateblue  157760
In [98]:
counts.sum()
Out[98]:
484622
In [77]:
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn import cluster, mixture
from sklearn.cluster import KMeans

to_cluster = summary[['pred_purch_coming_week=' + str(t),'monetary_value']]
to_cluster = to_cluster.dropna()

stscaler = StandardScaler().fit(to_cluster)
to_cluster = stscaler.transform(to_cluster)

default_base = {'quantile': .3,
                'eps': .3,
                'damping': .9,
                'preference': -200,
                'n_neighbors': 10,
                'n_clusters': 5,
                'min_samples': 20,
                'xi': 0.05,
                'min_cluster_size': 0.1}
params = default_base.copy()
# ============
# Create cluster objects
# ============
ms = cluster.MeanShift(bin_seeding=True)
two_means = cluster.MiniBatchKMeans(n_clusters=params['n_clusters'])
affinity_propagation = cluster.AffinityPropagation(
    damping=params['damping'], preference=params['preference'])
birch = cluster.Birch(n_clusters=params['n_clusters'])
dbscan = cluster.DBSCAN(eps=params['eps'])
gmm = mixture.GaussianMixture(
    n_components=params['n_clusters'], covariance_type='full')
vbgmm= mixture.BayesianGaussianMixture(
    n_components=params['n_clusters'], covariance_type='full')
km = KMeans(n_clusters=params['n_clusters'], 
             max_iter=100, 
             init='k-means++', 
             n_init=1,
             # Random state is where does the centroids start. 
             # I defined this after looking at different results
             # IMPORTANT is to verify validity of this approach from a conceptual perspective
             random_state=42,
             verbose=0)

clustering_algorithms = (
    ('MiniBatchKMeans', two_means),
    ('AffinityPropagation', affinity_propagation),
    ('MeanShift', ms),
    ('Birch', birch),
    ('DBSCAN', dbscan),
    ('GaussianMixture', gmm))

#for name, algorithm in clustering_algorithms:
km.fit(to_cluster)
y_pred = km.predict(to_cluster)
label_colors = ["tomato", "grey", "limegreen", "gold", "slateblue", "moccasin", "grey"]
colors = [label_colors[i] for i in y_pred]
plt.scatter(to_cluster[:, 0], 
            to_cluster[:, 1], 
            color=colors)
plt.xlabel('Normalized predicted purchases in the coming ' + str(t) + ' weeks')
plt.ylabel('Normalized Monetary Value in euros')
plt.title('Normalized [Predicted purchase by TO]');


centroids = km.cluster_centers_
#centroid_coords = pca.transform(centroids)
plt.scatter(centroids[:, 0], 
            centroids[:, 1], 
            marker='X', 
            s=200, 
            linewidths=2, 
            c='#444d60');

plt.savefig('cluster_all.png')
In [82]:
label_colors = ["tomato", "grey", "limegreen", "gold", "slateblue"]
unique, counts = np.unique(y_pred, return_counts=True)
print(counts)
print()



cluster_count = pd.concat([pd.DataFrame(label_colors), pd.DataFrame(counts)], axis = 1)#.sort_values(by='')
print(cluster_count)
[270826  26996  14562  67091 113228]

           0       0
0     tomato  270826
1       grey   26996
2  limegreen   14562
3       gold   67091
4  slateblue  113228

[5 rows x 2 columns]
In [86]:
[(i/np.sum(counts))*100 for i in counts]
Out[86]:
[25.1664183631779,
 5.590955425052928,
 59.25793711387431,
 8.886719959060876,
 1.0979691388339778]
In [28]:
unique, counts = np.unique(km.labels_, return_counts=True)
counts
Out[28]:
array([121962,  27095, 287177,  43067,   5321], dtype=int64)
In [90]:
# Based on customer history, we can predict what an individual's future purchases might look like:
t = 4  # predict purchases in t periods
individual = summary.iloc[12386]
# The function below is an alias for `bgf.conditional_expected_number_of_purchases_up_to_time`
bgf.predict(t, individual['frequency'], individual['recency'], individual['T'])
Out[90]:
0.025169409509570598

Including the transactional component

The model we are going to use to estimate the CLV of our user base is called the Gamma-Gamma submodel.
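In my reading of the Gamma-Gamma submodel (a sketch of the assumptions, not the full derivation), each customer's observed average transaction value varies around a latent mean spend, and spend is assumed independent of purchase frequency; this is why the correlation check hinted at in the commented-out code further down is worth actually running:

```latex
% Per-transaction value z_i of a customer is Gamma distributed,
% with shape p and the customer's latent rate \nu:
z_i \sim \mathrm{Gamma}(p, \nu)
% Heterogeneity of \nu across customers is itself Gamma distributed:
\nu \sim \mathrm{Gamma}(q, \gamma)
% Key assumption: monetary value is independent of the transaction process
```

The fitted parameters reported by `GammaGammaFitter` below (p, q, v) correspond to this parametrization.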

In [94]:
returning_customers_summary = summary[summary['frequency']>0]
returning_customers_summary = returning_customers_summary[returning_customers_summary['monetary_value']>0]
In [95]:
returning_customers_summary#[['monetary_value', 'frequency']].corr()
Out[95]:
frequency recency T monetary_value pred_purch_coming_week=12
SFID
0031i000001rNG6AAM 3.0 37.0 41.0 36.88 0.76
0031i000001rNGMAA2 1.0 13.0 68.0 8.02 0.13
0031i000001rNGRAA2 2.0 60.0 68.0 15.91 0.35
0031i000001rNGTAA2 2.0 36.0 68.0 52.46 0.30
0031i000001rNH5AAM 2.0 3.0 68.0 74.42 0.05
... ... ... ... ... ...
0035800001gxbOqAAI 12.0 65.0 68.0 30.62 1.86
0035800001gxbOuAAI 8.0 38.0 68.0 9.38 0.58
0035800001gxbWUAAY 10.0 64.0 68.0 13.82 1.55
0035800001gxbWXAAY 6.0 40.0 68.0 30.71 0.67
0035800001gxbWkAAI 2.0 34.0 68.0 10.31 0.30

117020 rows × 5 columns

In [96]:
from lifetimes import GammaGammaFitter

ggf = GammaGammaFitter(penalizer_coef = 0)
ggf.fit(returning_customers_summary['frequency'],
        returning_customers_summary['monetary_value'])
print(ggf)
<lifetimes.GammaGammaFitter: fitted with 117020 subjects, p: 1.86, q: 7.11, v: 92.53>
In [97]:
ggf.conditional_expected_average_profit(returning_customers_summary['frequency'],
                                        returning_customers_summary['monetary_value'])
Out[97]:
SFID
0031i000001rNG6AAM    32.35
0031i000001rNGMAA2    23.49
0031i000001rNGRAA2    23.55
0031i000001rNGTAA2    37.39
0031i000001rNH5AAM    45.71
                      ...  
0035800001gxbOqAAI    30.10
0035800001gxbOuAAI    14.86
0035800001gxbWUAAY    17.37
0035800001gxbWXAAY    29.82
0035800001gxbWkAAI    21.43
Length: 117020, dtype: float64
In [98]:
print("Expected conditional average profit: %s, Average profit: %s" % (
    ggf.conditional_expected_average_profit(
        returning_customers_summary['frequency'],
        returning_customers_summary['monetary_value']
    ).mean(),
    returning_customers_summary[returning_customers_summary['frequency']>0]['monetary_value'].mean()
))
Expected conditional average profit: 28.213612100866484, Average profit: 27.720022250879882
In [99]:
# refit the BG model to the summary_with_money_value dataset
bgf.fit(returning_customers_summary['frequency'], 
        returning_customers_summary['recency'], 
        returning_customers_summary['T'],
        returning_customers_summary['monetary_value'])
Out[99]:
<lifetimes.BetaGeoFitter: fitted with 117020 subjects, a: 0.21, alpha: 22.48, b: 1.32, r: 2.17>
In [100]:
clv = ggf.customer_lifetime_value(
    bgf, #the model to use to predict the number of future transactions
    returning_customers_summary['frequency'],
    returning_customers_summary['recency'],
    returning_customers_summary['T'],
    returning_customers_summary['monetary_value'],
    time=12, # months
    discount_rate=0.01,# monthly discount rate ~ 12.7% annually
    freq = 'W')
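Under the hood, `customer_lifetime_value` follows a discounted cash flow (DCF) idea: the expected spend of each future period is discounted back to today. A toy sketch of just the discounting step (the function name and inputs are illustrative, not the library's internals):

```python
def simple_dcf(expected_monthly_value, months=12, monthly_discount_rate=0.01):
    # Sum each future month's expected spend, discounted back to today
    return sum(
        expected_monthly_value / (1 + monthly_discount_rate) ** m
        for m in range(1, months + 1)
    )

# A customer expected to spend 10 EUR/month over 12 months, at a 1% monthly discount rate:
value = simple_dcf(10, months=12, monthly_discount_rate=0.01)
```

The actual library additionally weights each month's spend by the predicted number of transactions from the BG/NBD model, so `value` here is only the discounting skeleton.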
In [104]:
clv
Out[104]:
SFID
0031i000001rNG6AAM    108.28
0031i000001rNGMAA2      9.06
0031i000001rNGRAA2     43.58
0031i000001rNGTAA2     50.25
0031i000001rNH5AAM      5.08
                       ...  
0035800001gxbOqAAI    212.12
0035800001gxbOuAAI     30.48
0035800001gxbWUAAY    104.24
0035800001gxbWXAAY     73.40
0035800001gxbWkAAI     27.26
Name: clv, Length: 117020, dtype: float64
In [102]:
def normalize(df):
    # Min-max scaling to the [0, 1] range
    min_value = df.min()
    max_value = df.max()
    return (df - min_value) / (max_value - min_value)
In [106]:
normalize(clv).plot(kind='hist', bins = 100)
Out[106]:
<matplotlib.axes._subplots.AxesSubplot at 0x20db1fa2e08>

I am also trying an RFM segmentation, just for fun

In [115]:
quantiles = df_ml.quantile(q=[0.25,0.5,0.75])
quantiles = quantiles.to_dict()
In [310]:
segmented_rfm = df_ml.copy()
In [311]:
def RScore(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
def FMScore(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1
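As a side note, the hand-rolled quartile functions above can be replaced by pandas' `qcut`, which bins a series into equal-sized quantile buckets directly (with the label order reversed for frequency/monetary-style scores); a quick sketch:

```python
import pandas as pd

s = pd.Series([5, 1, 8, 3, 7, 2, 6, 4])
# Recency-style score: lowest quartile gets 1, highest gets 4
r_score = pd.qcut(s, q=4, labels=[1, 2, 3, 4]).astype(int)
# Frequency/monetary-style score: highest quartile gets 1
fm_score = pd.qcut(s, q=4, labels=[4, 3, 2, 1]).astype(int)
```

This avoids computing the quantile dictionary by hand, at the cost of less control over tie handling.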
In [314]:
segmented_rfm['r_quartile'] = segmented_rfm['recency'].apply(RScore, args=('recency',quantiles))
segmented_rfm['f_quartile'] = segmented_rfm['frequency'].apply(FMScore, args=('frequency',quantiles))
segmented_rfm['m_quartile'] = segmented_rfm['monetary_value'].apply(FMScore, args=('monetary_value',quantiles))
segmented_rfm.head()
Out[314]:
club_card_number T recency frequency monetary_value predicted_purchases r_quartile f_quartile m_quartile
0 2245598906 59.000000 0.142857 1 17.99 0.000183 1 4 4
1 7004667809 56.000000 0.142857 1 35.98 0.000207 1 4 3
2 8002162035 76.285714 0.142857 1 24.99 0.000099 1 4 4
3 9002067672 78.571429 0.142857 1 14.49 0.000092 1 4 4
4 9002488707 56.142857 0.142857 1 43.97 0.000206 1 4 2
In [323]:
segmented_rfm['RFMScore'] = segmented_rfm.r_quartile.map(str) + segmented_rfm.f_quartile.map(str) + segmented_rfm.m_quartile.map(str) 

segmented_rfm.head()
Out[323]:
club_card_number T recency frequency monetary_value predicted_purchases r_quartile f_quartile m_quartile RFMScore
0 2245598906 59.000000 0.142857 1 17.99 0.000183 1 4 4 144
1 7004667809 56.000000 0.142857 1 35.98 0.000207 1 4 3 143
2 8002162035 76.285714 0.142857 1 24.99 0.000099 1 4 4 144
3 9002067672 78.571429 0.142857 1 14.49 0.000092 1 4 4 144
4 9002488707 56.142857 0.142857 1 43.97 0.000206 1 4 2 142
In [324]:
segmented_rfm[segmented_rfm['RFMScore']=='111'].sort_values('monetary_value', ascending=False).head(10)
Out[324]:
club_card_number T recency frequency monetary_value predicted_purchases r_quartile f_quartile m_quartile RFMScore
8676 2246637250 20.285714 3.285714 5 225.104000 3.022869e-04 1 1 1 111
3161 2245355339 75.857143 1.285714 4 208.227500 6.699403e-08 1 1 1 111
4895 2246478594 14.428571 1.714286 4 179.925000 8.070156e-04 1 1 1 111
2129 2244817740 78.857143 0.857143 4 178.532500 2.337558e-08 1 1 1 111
9093 2245836304 37.000000 3.571429 9 171.054444 7.376102e-09 1 1 1 111
12861 2135671010 66.857143 5.571429 8 157.046250 3.913895e-09 1 1 1 111
18726 2245705487 39.571429 8.000000 4 152.577500 7.331495e-04 1 1 1 111
10604 2245627625 56.285714 4.428571 4 150.830000 1.268408e-05 1 1 1 111
2014 2246807797 11.428571 0.714286 5 150.186000 8.828257e-05 1 1 1 111
15029 2245253836 68.857143 6.571429 6 149.100000 4.584027e-07 1 1 1 111

Comparing DeFacto & Orsay WM scoring

In [66]:
df_defacto
Out[66]:
SFID ScoringIdentifier Score
0 0031i00000J5bZkAAJ Spring Special 2020 0.19
1 0031i00000Hhdq9AAB Spring Special 2020 0.03
2 0031i00000J4DZWAA3 Spring Special 2020 0.05
3 0031i00000Pb315AAB Spring Special 2020 0.08
4 0031i00000PZjouAAD Spring Special 2020 0.06
... ... ... ...
8880164 0035800000ydPdBAAU Spring Special 2020 0.69
8880165 0035800000zHNDIAA4 Spring Special 2020 0.02
8880166 0031i00000PZY9dAAH Spring Special 2020 0.11
8880167 0031i00000LJtiXAAT Spring Special 2020 0.37
8880168 0031i00000PaGn1AAF Spring Special 2020 0.07

8880169 rows × 3 columns

In [81]:
df_orsay = summary.copy()
df_orsay.head(1)
Out[81]:
frequency recency T monetary_value pred_purch_coming_week=12
SFID
0031i000001rNG6AAM 3.0 37.0 41.0 36.88 0.76

1 rows × 5 columns

In [83]:
df_orsay.reset_index(level=0, inplace=True)
df_orsay.head(1)
Out[83]:
index SFID frequency recency T monetary_value pred_purch_coming_week=12
0 0 0031i000001rNG6AAM 3.0 37.0 41.0 36.88 0.76

1 rows × 7 columns

In [ ]:
merged_data = df_orsay.merge(df_defacto, on='SFID', how='inner')
In [93]:
df_orsay.shape
Out[93]:
(194134, 7)
In [92]:
merged_data.shape
Out[92]:
(178003, 9)
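The drop from 194,134 to 178,003 rows comes from SFIDs present in only one of the two tables. As a quick sketch (on toy dataframes with made-up SFIDs, since the real tables are too large to show), pandas' merge `indicator` flag reveals which side unmatched rows came from:

```python
import pandas as pd

# Toy stand-ins for df_orsay and df_defacto (hypothetical SFIDs)
left = pd.DataFrame({'SFID': ['A', 'B', 'C'], 'pred': [0.7, 0.1, 0.4]})
right = pd.DataFrame({'SFID': ['B', 'C', 'D'], 'Score': [0.2, 0.9, 0.5]})

# indicator=True adds a '_merge' column flagging the origin of each row
audit = left.merge(right, on='SFID', how='outer', indicator=True)
print(audit['_merge'].value_counts())
```

Rows flagged `left_only` or `right_only` are the ones an inner merge silently discards.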
In [91]:
(p9.ggplot(mapping=p9.aes(x='pred_purch_coming_week=12', y='Score'), data=merged_data)
+ p9.geom_point())
Out[91]:
<ggplot: (-9223371895772317568)>

March 30, 2020

An Approach to Computing Predicted Purchase - Part 1/3

Predictive Purchase Computation in Python

Table of Contents

Background

Typical Data

Heuristic approaches

BetaGeo Model

References

Background:

If you work with transactional data, you have probably faced the challenge of computing meaningful KPIs that quantify how promising a given customer is or will be (based on past data). One computation that comes in handy when evaluating customers is the predicted purchase probability, i.e., a value that allows you to assess whether it might be worth investing in this customer from a marketing perspective.

One might frame this question as follows: given the purchase history of different customers up to the present moment, what will be the purchase pattern of these customers in a given future time window?
The drawing below illustrates the question's framework:

In [1]:
from IPython.display import Image
PATH = "/home/julien/data_blog/post_material/predictive_purchase/"
Image(filename = PATH + "purchase_history.png", width=500, height=500)
Out[1]:

This post is the first of a series of posts on the topic of predictive purchase.

Here, I use a method discussed in a paper published by Fader and colleagues in 2005 (link at the end of this notebook).
Feel free to browse the following posts if you are interested in reading about other models (hierarchical Bayesian models) and exploratory analysis of model fits.

Important note
While I used this model in a business project, and it produced useful results that went into production, I am not an expert on financial computations, so take this as it is: an honest attempt to compute a useful metric! Stay critical, and don't take this approach as textbook material!

Import libraries

We start by importing the required libraries and setting a few notebook parameters I like to use. If you do not have these libraries installed, simply install them in your environment using pip install, or check the documentation online.

In [8]:
import os
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import plotnine as p9

import pandas as pd
pd.set_option("display.max_rows", 120)
pd.set_option("display.max_columns", 120)

display_settings = {
    'max_columns': 10,
    'expand_frame_repr': True,  # Wrap to multiple pages
    'max_rows': 50,
    'precision': 2,
    'show_dimensions': True
}

for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)

from IPython.display import clear_output
import timeit

# Set figure size for easier reading
plt.rcParams['figure.figsize'] = (16,8)

Typical Data:

Let's import an example dataset of what transactional data could look like:

In [9]:
from lifetimes.datasets import load_dataset

df = load_dataset(
    'CDNOW_sample.txt',
    header=None,
    names=['customer_id', 'customer_index', 'transaction_date', 'quantity', 'amount'],
    delim_whitespace=True,
    # converters={'transaction_date': lambda d: pd.to_datetime(d, format='%Y%m%d')}
).drop(columns=['customer_index'])

df.head(2)
Out[9]:
customer_id transaction_date quantity amount
0 4 19970101 2 29.33
1 4 19970118 2 29.73

2 rows × 4 columns

This is what a typical (already clean ;)) transactional dataset looks like. We have access to three data features:
(i) date of transaction,
(ii) quantity purchased,
(iii) customer ID.

That is all you need to get started.

In order to train the model, you need to boil the data down to a customer-level dataframe with three parameters:
1) frequency: number of repeat purchases for that customer (purchases after the first one).
2) recency: time between first and last purchase, in days, weeks, or any scale meaningful to your business.
3) T: age of the customer (birth = first purchase).

This would be a so-called RFM table (Recency-Frequency-Monetary; note that I will discuss the monetary component in a separate post). So, to be more accurate, an RF table ;)
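For intuition, here is a sketch of how such a table could be built by hand with pandas groupby (the `summary_data_from_transaction_data` utility used in the next cell does this for you; the toy transaction log and its column names are illustrative):

```python
import pandas as pd

# Toy transaction log: customer 2 purchases three times, customer 7 once
tx = pd.DataFrame({
    'id':   [2, 2, 2, 7],
    'date': pd.to_datetime(['2014-01-01', '2014-02-10', '2014-05-23', '2014-03-01']),
})
end = tx['date'].max()  # end of the observation period

g = tx.groupby('id')['date']
rf = pd.DataFrame({
    'frequency': g.nunique() - 1,             # repeat purchases (first one excluded)
    'recency':   (g.max() - g.min()).dt.days, # days between first and last purchase
    'T':         (end - g.min()).dt.days,     # customer age in days
})
```

Customers with a single purchase end up with frequency 0 and recency 0, which is exactly the pattern we will see in the real table below.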

In [10]:
from lifetimes.datasets import load_transaction_data
from lifetimes.utils import summary_data_from_transaction_data

# Load transaction dataset
transaction_data = load_transaction_data()
# Transform to RFM table
rfm = summary_data_from_transaction_data(transaction_data, customer_id_col = 'id', datetime_col='date')
rfm.head(3)
Out[10]:
frequency recency T
id
0 0.0 0.0 298.0
1 0.0 0.0 224.0
2 6.0 142.0 292.0

3 rows × 3 columns

We can see that customer #2 made 6 repeat purchases, that his first purchase happened 292 days ago (T), and that 142 days elapsed between his first and last purchase (recency).
The drawing below depicts the meaning of each parameter in the RFM table.

In [11]:
PATH = "/home/julien/data_blog/post_material/predictive_purchase/"
Image(filename = PATH + "recency_frequency.png", width=500, height=500)
Out[11]:

We can have a quick look at these features to see whether the distributions make sense.

In [12]:
pd.plotting.scatter_matrix(rfm, hist_kwds={'bins':50});

A few observations:
(i) As expected, the majority of customers purchased very few times (frequency).
(ii) As a result, the recency feature has a major peak at 0, which contains all customers who purchased only once from that business.
(iii) The business had some stability in acquiring new customers, as suggested by the uniformity of the age (T) distribution.

Heuristic Approach: RFM score

The dataset above is an example of what transactional data might look like.

A very simple approach, and an easy one to implement, would be to calculate a so-called RFM score at the customer level. Each feature is categorized according to the quartiles of its distribution, which yields a score for each customer (in our case only an RF score, since we do not have access to monetary value data).

In [13]:
quantiles = rfm.quantile(q=[0.25,0.5,0.75])
quantiles = quantiles.to_dict()

segmented_rfm = rfm.copy()
In [14]:
def RScore(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
def FMScore(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1
In [15]:
# Categorize data points according to quartile
segmented_rfm = segmented_rfm.assign(r_quartile = segmented_rfm['recency']
                                     .apply(RScore, args=('recency',quantiles)))
segmented_rfm = segmented_rfm.assign(f_quartile = segmented_rfm['frequency']
                                     .apply(FMScore, args=('frequency',quantiles)))
# Create RF score per customer
segmented_rfm['RFScore'] = (segmented_rfm.r_quartile.map(str) 
                           + segmented_rfm.f_quartile.map(str) 
                            )
segmented_rfm.head()
Out[15]:
frequency recency T r_quartile f_quartile RFScore
id
0 0.0 0.0 298.0 1 4 14
1 0.0 0.0 224.0 1 4 14
2 6.0 142.0 292.0 4 1 41
3 0.0 0.0 147.0 1 4 14
4 2.0 9.0 183.0 3 2 32

5 rows × 6 columns

We have produced, for each customer, an RF score that can already inform the business on which customers it might be worth focusing.
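As an aside, the same quartile labelling can be done more concisely with pandas' qcut (a sketch on toy values; ranking first breaks ties, so scores at quartile boundaries may differ slightly from the manual thresholds above):

```python
import pandas as pd

# Toy RF values for four customers
toy = pd.DataFrame({'recency':   [0.0, 5.0, 40.0, 120.0],
                    'frequency': [0.0, 1.0, 3.0, 9.0]})

# rank(method='first') breaks ties so qcut always finds 4 distinct bins;
# recency labels ascend (1 = lowest quartile) while frequency labels descend
toy['r_quartile'] = pd.qcut(toy['recency'].rank(method='first'), 4, labels=[1, 2, 3, 4])
toy['f_quartile'] = pd.qcut(toy['frequency'].rank(method='first'), 4, labels=[4, 3, 2, 1])
toy['RFScore'] = toy['r_quartile'].astype(str) + toy['f_quartile'].astype(str)
```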

However, while this method has some virtues (interpretability, simplicity, speed of implementation), one might wonder whether it is wise to assume that past behavior maps onto future behavior so straightforwardly.

Instead, we could use a probabilistic model to predict purchases based on each customer's past purchase patterns.

BetaGeo Model

Model specification

Pareto/NBD vs BetaGeo/NBD

From a historical perspective, the Pareto/NBD model has been an attractive choice for computing predicted purchases across different business models. However, its complexity made it difficult for businesses to implement for regular updates of predicted purchase indexes. As an alternative, Fader and colleagues proposed the BetaGeo/NBD model, which produces an arguably comparable fit to the Pareto/NBD but is simpler to implement.

We will first focus on the BG/NBD model and test its implementation on a public dataset. In a follow-up post (An Approach to Computing Predicted Purchase - Part 3/3), I will use the Pareto/NBD model on the same dataset and explore it from another perspective, using hierarchical Bayesian modeling to predict purchases.

In [16]:
from lifetimes import BetaGeoFitter
bgf = BetaGeoFitter(penalizer_coef=0.0)

Below, I wrote down the assumptions about the parameter distributions of the BetaGeo model, following the original research paper.

Assumptions of the BG/NBD model

(i) While active, the number of transactions made by a customer follows a Poisson process with transaction rate $\lambda$.
This is equivalent to assuming that the time between transactions is exponentially distributed with transaction rate $\lambda$:


\begin{equation*} f(t_j \mid t_{j-1}; \lambda) = \lambda e^{-\lambda (t_j - t_{j-1})}, \qquad t_j > t_{j-1} > 0 \end{equation*}

(ii) Heterogeneity in lambda follows a gamma distribution

\begin{equation*} f(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\lambda \alpha}}{\Gamma(r)}, \qquad \lambda > 0 \end{equation*}

(iii) After any transaction, a customer becomes inactive with probability p. Therefore the point at which the customer “drops out” is distributed across transactions according to a (shifted) geometric distribution

\begin{equation*} P(\text{inactive after } j\text{-th transaction}) = p(1-p)^{j-1}, \qquad j = 1, 2, 3, \ldots \end{equation*}

(iv) Heterogeneity in p follows a beta distribution

\begin{equation*} f(p \mid a, b) = \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}, \qquad 0 \leq p \leq 1 \end{equation*}

(v) The transaction rate lambda and the dropout probability p vary independently across customers
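To build intuition for these five assumptions, one can simulate the generative process directly. Below is a minimal sketch with numpy; the `simulate_customer` helper and the parameter values are made up for illustration, not the fitted ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_customer(r, alpha, a, b, T):
    """Simulate one customer's repeat-purchase times under the BG/NBD story."""
    lam = rng.gamma(shape=r, scale=1.0 / alpha)  # (ii) lambda ~ Gamma(r, alpha)
    p = rng.beta(a, b)                           # (iv) p ~ Beta(a, b)
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam)          # (i) exponential waiting times
        if t > T:                                # observation window ends
            break
        times.append(t)
        if rng.random() < p:                     # (iii) dropout after any purchase
            break
    return times

# One customer observed over 52 weeks (illustrative parameters)
times = simulate_customer(r=0.2, alpha=2.0, a=1.9, b=3.5, T=52.0)
```

Drawing $\lambda$ and $p$ fresh for each customer is what assumption (v), their independence across customers, looks like in code.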

Let's check whether the model predicts data acceptably well. To do so, we will train the model on a training set and ask it to predict data from a test set (also known as the validation / holdout period).

You could do this yourself using scikit-learn or a custom function, but the Lifetimes library provides a utility function to split the original transactional data (which is often the format available at companies).

Model validation

In [17]:
from lifetimes.utils import calibration_and_holdout_data
transaction_data = load_transaction_data()

calibration_period_end='2014-09-01'
observation_period_end='2014-12-30'

summary_train_test = calibration_and_holdout_data(transaction_data, 'id', 'date',
                                                   calibration_period_end=calibration_period_end,
                                                   observation_period_end=observation_period_end)
summary_train_test.head(2)
Out[17]:
frequency_cal recency_cal T_cal frequency_holdout duration_holdout
id
0 0.0 0.0 177.0 0.0 120
1 0.0 0.0 103.0 0.0 120

2 rows × 5 columns

From the output above, we see that the function 1) splits the data into two sets and 2) produces a RF table for these periods.

In [18]:
# Fit the model to the calibration (training) period
In [19]:
bgf.fit(summary_train_test['frequency_cal'], 
        summary_train_test['recency_cal'], 
        summary_train_test['T_cal']
       )
Out[19]:
<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.96, alpha: 2.26, b: 3.48, r: 0.19>

We run some standard diagnostics, plotting the average number of transactions in the test set against the training set, for observed ("frequency_holdout") and predicted ("model predictions") data.

In [21]:
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_train_test)

We can see that the model is quite good at picking up the purchase dynamics.

Most customers make a low number of purchases, which is probably why we get a better fit at the lower bins compared to the higher ones.

We could also plot a model-free estimate of future purchases (or rather, model-based, the "model" being a simple heuristic that scales historical values by the relative length of the holdout period).

See the plot below: with this heuristic we would be overestimating purchases, which would mislead the business in the future.

In [23]:
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_train_test)
# Age in training set
age_train = np.mean(summary_train_test.T_cal)
# Age in testing set is the difference between the two time points taken to define test set
age_test = time_window = (pd.to_timedelta(pd.to_datetime(observation_period_end) 
                        - pd.to_datetime(calibration_period_end)).days)
# X axis limit from observed output
xlim=[0,7]
x_axis_no_model = np.linspace(xlim[0],xlim[1], 100)
no_model_line = (age_test/age_train) * x_axis_no_model

plt.plot(x_axis_no_model, no_model_line, linestyle='--', color='black', label='no model')
Out[23]:
[<matplotlib.lines.Line2D at 0x7fecec822b90>]

Now that we have validated the model using a validation set, let's train the model on the whole dataset.

Model prediction

In [186]:
from lifetimes import BetaGeoFitter
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(rfm['frequency'], rfm['recency'], rfm['T'])
print(bgf)
<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.85, alpha: 1.86, b: 3.18, r: 0.16>
In [187]:
# Model summary
bgf.summary
Out[187]:
coef se(coef) lower 95% bound upper 95% bound
r 0.16 3.99e-03 0.16 0.17
alpha 1.86 9.54e-02 1.68 2.05
a 1.85 1.30e-01 1.60 2.10
b 3.18 2.94e-01 2.61 3.76

4 rows × 4 columns

Lifetimes provides some nice matrix plots that let you visualize the probability of customers being alive given the parameters, along with other features of your dataset. I won't use them here, but I recommend exploring these plots; they can add value when explaining the process to stakeholders.

Now that our model has been validated and fitted on the whole dataset, we can go on and compute the predicted purchase value for each customer in the dataset, for a given time window t:

In [190]:
# Predicted purchases are calculated for a given time window t
t = 12
rfm['pred_purch_coming_week=' + str(t)] = (bgf.conditional_expected_number_of_purchases_up_to_time
                                               (
                                                t, 
                                                rfm['frequency'], 
                                                rfm['recency'], 
                                                rfm['T']
                                               )
                                              )
In [195]:
# Plot the predicted purchase histogram
rfm['pred_purch_coming_week=' + str(t)].plot(kind='hist', bins=50)
plt.xlabel('expected number of purchases in coming weeks=' + str(t))
plt.ylabel('customer count')
plt.title('Customer count per expected number of purchases');

We obtained a distribution of predicted purchases for each customer over the 12 weeks following the end of data collection.
Since most customers made few purchases, we see a strong peak at 0, with a right tail grouping the customers we can expect to bring value to the business.
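To make this distribution actionable, one could simply rank customers by their expected purchase count. A sketch on illustrative values (the column name matches the rfm table built above; the numbers are made up):

```python
import pandas as pd

# Illustrative predicted purchase counts for five customers
preds = pd.DataFrame(
    {'pred_purch_coming_week=12': [0.02, 1.35, 0.00, 0.61, 2.10]},
    index=pd.Index([0, 1, 2, 3, 4], name='id'),
)

# Customers with the highest expected purchases over the next 12 weeks come first
top = preds.sort_values('pred_purch_coming_week=12', ascending=False)
```

Marketing could then focus its budget on the head of this ranking rather than on the peak of inactive customers at 0.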



In the following post, I will explore the robustness of this model by playing with the data shape.

References

Original research paper from Fader et al. 2005: http://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf
Lifetimes library documentation: https://lifetimes.readthedocs.io/en/latest/